In a related study, factor analysis was applied
with varying profiles, while the other two were composed of new and
developing schools. Of the remaining three clusters, two were
predominantly private schools and one was an equal mix of public and 
private schools. Each cluster was also described in terms of 
variables selected from the original data. (Author/MSE) 

EXECUTIVE SUMMARY



This report, An Empirical Classification of U.S. Medical
Schools by Institutional Dimensions, describes one
of five studies performed by the Association of American 
Medical Colleges (AAMC) in 1976 examining the character- 
istics of U.S. medical schools and the interrelationships 
among the schools and among variables that describe them. 
Two of the five studies were replications of earlier work. 
The other three studies, including this one, used multi- 
variate statistical methods — factor analysis, cluster anal- 
ysis, and multidimensional scaling — to explore the extensive 
body of data on the institutions maintained by AAMC in the 
Institutional Profile System (IPS). In 1976, factor analy- 
sis was applied to reduce a selected set of variables to 
their principal dimensions. The variables used represented 
the data found most interesting in earlier studies and new 
data which showed a potential for revealing interesting 
new areas of institutional variability. The results of 
the factor analysis were then used as input to a series
of multivariate cluster analyses which isolated clusters 
of medical schools that were similar to each other and 
different from schools in other clusters on the dimensions 
depicted by the factor analysis. 

The original data on which the study is based were 
selected from the more than 8,000 data elements in IPS. 
A total of 140 variables were selected from four categories 
of measures: (1) institutional, (2) student, (3) faculty, 
and (4) curriculum. Through a series of correlational 
studies this variable set was reduced to 33 variables which 
represented the most complete, representative, and interesting
data available. The 33 variables were factor analyzed,
eight factors were rotated using a varimax criterion,
and factor scores were computed on the eight factors for 
110 medical schools. 

The cluster analysis described in this report was per- 
formed in two stages. Initially, the 110 schools were 
clustered hierarchically using a technique developed by 
Ward. The result of this analysis was used as input to a 
non-hierarchical cluster analysis to refine the final 
groupings of schools. A number of combinations of factor 
scores and numbers of clusters were produced and the resulting
clusters compared. A final solution of eight clusters
based on six factor scores was selected reflecting the best 
groups of medical schools on the most meaningful dimensions. 

The eight clusters in the final solution each had dis- 
tinctive profiles on the six factor scores. There were five 
clusters which consisted completely or predominantly of 
public schools. Three of these clusters consisted of established
schools with varying profiles, while the other two
were composed of new and developing schools. Of the remain- 
ing three clusters, two were predominantly private schools 
and one was an equal mix of public and private schools. 
Each cluster of schools was also described in terms of variables
selected from the original data which were factor
analyzed. This information provided an added dimension of 
distinctiveness to the clusters described in the study. 



Chapter I 



INTRODUCTION 



This report describes the third in a series of studies 
performed by the Association of American Medical Colleges
(AAMC) in which multivariate cluster analysis was used to
group medical schools on the basis of quantitative data
contained in the Association's Institutional Profile System
(IPS). The purpose of the series of studies is to empirically
derive groups or clusters of medical schools such
that the schools in each cluster are similar to each other
and different from schools in other clusters. The basis on
which the clusters were formed included several
measurable aspects of the institutions, such as general
institutional, financial, faculty, student, and curricular
characteristics. In other words, the goal of the analyses
was to isolate groups of medical schools which are similar
to one another on a number of dimensions.

Previous AAMC Cluster Analysis Studies 

The first AAMC cluster analysis study, Classification
of Medical Education Institutions (Nunn and Lain, 1976),
was performed in 1975. That study was patterned after one
conducted by the Rand Corporation in 1972 (Keeler et al., 1972).
In the Rand study, 31 institutional variables for 94 U.S.
medical schools were factor analyzed, six factors were 
rotated, and factor scores for each factor were generated 
for the 94 schools. These schools were then formed into 
10 groups using cluster analysis. In the 1975 AAMC study, 
23 of the 31 variables used in the Rand study were factor 
analyzed. Using data primarily from 1973-74, six factors 
were extracted from the 23 variables, the factors were 
rotated, and factor scores were calculated for 99 medical 
schools. These 99 schools were then clustered into 16 
groups based on their similarities on the six factor scores. 
In the 1975 AAMC study the cluster analysis was performed 
in two stages. First, a hierarchical cluster analysis was 
used to assess the number of clusters and potential cluster 
centers. The second step was to use a non-hierarchical 
cluster analysis to refine the membership of the 16 clusters. 

The 1975 AAMC cluster analysis study was replicated one 
year later (McShane, 1977a) using the same variables and 
essentially the same methods. There were two principal
differences between the 1975 study and the 1976 replication: 
(1) the data used in the replication were primarily from
1974-75, and (2) due to the availability of more complete 
data, 109 schools were included in the cluster analysis. 
In both the factor analysis and the cluster analysis, a 
number of similarities in the results from the two years 
were found. However, the discrepancies in the findings 
and the apparent increased clarity of the replication 
results seemed to indicate that further development of the 
methods and further exploration of the data would increase 
understanding of similarities and differences among medical 
schools. 

Overview of the Present Study 

As a result of the 1976 cluster replication, a number
of recommendations were made for further studies in this 
area. Among the recommendations were the following: 

1. The selection of variables should be altered 
to include new variables which would describe 
a research emphasis dimension on which 
medical schools could be compared. 

2. Special attention should be paid to the 
"control" (public vs. private) dimension, 
and a way should be sought to either 
eliminate or statistically control the 
effects of this dimension. 

3. The number of clusters should be determined 
through the analytic process rather than 
specified a priori.

4. The changes in the membership of the clusters 
over time should be examined to ascertain 
whether there are some schools which group
together in several studies.

5. The potential for basing the cluster 
analysis on the original data as opposed to 
factor scores should be given further
consideration, and the effects of missing 
data and outlying schools on the analysis 
should be investigated. 

All of the recommendations listed above were taken into 
consideration during the course of the study described in 
this report. A new set of 33 variables, including 7 
variables used in the previous clustering studies and 26
additional variables, was selected through a series of
factor analyses (Sherman, 1977b). These 33 variables were
factor analyzed, and 8 factors were extracted and rotated. 
Factor scores were then computed on the eight factors for 
each of the 110 medical schools that had data for more than 
80 percent of the variables. 

In the cluster analysis stage of the study, the second
and third recommendations listed above were incorporated. 
The effects of the control dimension on the solution were 
taken into account through the selection of the factor 
scores used in the cluster analyses. Since factor scores 
represent independent composite measures of dimensions on 
which medical schools vary, they replace the raw data as 
the basis for the cluster analysis. As such, one or more 
of the factor scores may be deleted from the analysis and
the effect of removing the variables may be assessed.

A number of combinations of factor scores were used as
input in a hierarchical cluster analysis, and the effects
of the exclusion of variables on the resultant clusters were
assessed in these solutions. In addition, there was no
preconception of the number of clusters which would emerge
from the analysis. The number of clusters was determined
by the analysis of the data at hand and by comparing
solutions involving varying numbers of clusters. Finally,
the memberships of the clusters were refined by using one 
school from each cluster in the hierarchical solution as a 
starting point for a non-hierarchical cluster analysis. In 
the non-hierarchical cluster analysis, a number of solutions 
involving varying numbers of clusters were derived. The 
solutions which optimally satisfied the criterion of 
minimizing differences among the schools in each cluster 
while maximizing the differences among the clusters are
presented in this report. 



Chapter II



METHOD



The study described in this report was conducted in 
five stages: (1) selection of variables, (2) factor
analysis, (3) computation of similarities, (4) hierarchical 
cluster analysis, and (5) non-hierarchical cluster analysis. 
In this chapter each step in the analysis will be described. 
Further explication of the first two steps in the analytic 
process can be found in a companion report by Sherman 
(1977b). 

Selection of Variables 

AAMC's Institutional Profile System (IPS) is the
repository for most of the institutional data collected by 
the Association. In August, 1976, there were over 8,000 
data elements from over 60 different sources in IPS. Many 
of the data were longitudinal repetitions of the same 
variable for as many as 15 years (1959-60 through 1974-75).
The data in IPS come from a number of different kinds of 
sources. The major sources are annual surveys such as 
Parts I and II of the Liaison Committee on Medical Education's
(LCME) Annual Questionnaire, which provide a wealth of
information on medical school finances and detailed counts 
of students, faculty, and facilities; the Fall Enrollment 
Questionnaire which provides additional student counts; 
and information on types of programs and electives gathered 
to be published in the AAMC Curriculum Directory. Additional 
data are taken from special-purpose surveys and questionnaires, 
such as the 1973 Health Services Delivery and Primary Care 
Education questionnaire, the 1975 AAMC questionnaire on student 
affairs resources, and the 1973 questionnaire on medical 
school facilities; other AAMC information systems such as 
the Faculty Roster System (FRS), the Medical School Applicant
file, and the Medical Student Information System (MSIS); and
other organizations' information systems such as AMA's Medical 
School Alumni file and the IMPAC file maintained by the 
Division of Research Grants (DRG) of the Department of Health, 
Education, and Welfare (DHEW) which contains information on 
grant applications to NIH and selected other agencies within 
DHEW. All of the data transmitted from other AAMC informa- 
tion systems and other agencies are aggregated by institution 
prior to being stored in IPS. 

To facilitate the use of data from IPS in the studies
using institutional data, a Researchable Data Base was
constructed. Data elements were selected for inclusion in
the Researchable Data Base if they were the most recent
repetition of a particular variable and were potentially
useful in one or more of the studies specified in the
contract. A total of 399 variables, including institutional,
faculty, student, and curriculum measures, were transferred
from IPS to the Researchable Data Base. In addition, 201
variables were computed from the original data and
stored in the Researchable Data Base. The computed
variables described attributes of the medical schools within
(e.g., the percentage of females among undergraduate medical
students) and across (e.g., the ratio of undergraduate medical
students to full time medical school faculty members) the
four categories noted above. A complete discussion of the
1976 IPS Researchable Data Base and a list of the variables 
included may be found elsewhere (McShane, 1977b). 

From the total of 600 variables in the Researchable 
Data Base, 139 were selected for consideration in this study. 
A series of correlational studies was conducted within each
of the ad hoc categories described above to select a final
set of variables which had recorded values for nearly all
schools and were representative of the principal dimensions
within each of the categories. The final set of variables
factor analyzed and used to produce the factor scores which
were the basis of this study is presented in Table 1. (A
glossary of abbreviations is presented in Appendix A.) From
the information presented in Table 1 it is evident that 17
of the 33 variables used in the final factor analysis on
which this study was based were new variables. These new
variables were either not available for earlier studies in
the series, or replaced similar variables for reasons of
completeness or representativeness discussed above. In
addition, since part of the intent of Sherman's (1977b) study
was to expose previously undisclosed relationships among
variables, when two variables were approximately equivalent
in completeness and representation, a previously unused
variable was selected over one used in earlier studies.

The final set of 33 variables contained 14 student
variables, 13 institutional variables, 4 faculty variables,
and 2 curriculum variables. There are a number of reasons
for the disproportionate selection of variables from the four
categories. First, most of the data in IPS are either
institutional or student descriptors. Secondly, the curriculum
data in IPS are predominantly qualitative and as such are
of limited utility in studies of this type. Finally,
computed variables which crossed categories (e.g., the ratio
of medical students to full time faculty, INC058) were
classified as institutional measures for the purpose of the
development of the IPS Researchable Data Base.



TABLE 1

VARIABLES USED IN FACTOR ANALYSIS OF
INSTITUTIONAL DATA, 1976

VARIABLE  DESCRIPTION                                   NOTE

VAR388    AV SALARY - SFT ASSOC PROF BASIC SCIENCE
STC043    RAT: HOUSESTAFF TO UNDERGRAD MD-STUD          1
INC058    RAT: MD STUDENTS TO FT FAC                    3
STC105    % LIVING MD-ALUMNI IN GENERAL PRACTICE
FAC003    % PT & FT SAL FAC WITH MD
VAR016    # MD-STUDENTS                                 1, 2
INC048    LOG AGE OF MEDICAL SCHOOL                     1
STC112    % LIVING MD ALUM BOARD CERTIFIED
VAR002    CONTROL: 0 = PUBLIC, 1 = PRIVATE              1, 2
VAR394    1975-76 RESIDENT MD-STUDENT TUITION
STC029    % IN-STATE 1ST-YR MD-STUD                     1
STC084    RAT: APPLICANTS PER 1ST-YR MD-STUD            2
INC007    % REV FROM FED SOURCES & RCOV IND COSTS
INC012    % REV FROM ALL GIFTS
STC082    % UNDERREP MINORITY 1ST-YR MD-STUD
FAC004    % PT & FT SAL FAC FROM ETHNIC MINORITIES
STC008    % NON US-CANADIAN 1ST-YR MD-STUD              4
VAR093    1ST-YR MD-STUD: MEAN MCAT SCIENCE SCORE       2
INC040    NIH-NIMH R01 $ AWARD AS % OF $ APP SBMITTED
VAR352    IMPAC: MEAN STD P-SCR - R01 APP
INC045    IRG APPROVAL RATE OF NIH R01 COMP APPS
STC003    % FEMALE MD STUDENTS
VAR273    REL ELECTIVES: ALCOHOLISM
CRC002    % OF RELATED ELECTIVES OFFERED
FAC019    RAT: VOL FAC TO FT FAC                        1, 2
INC003    DRG FED SPON RES CON$ %CHG 67-9 to 72-4       2
STC114    PROJTD ANNL % 1ST-YR ENROLL CHG: 1974-79
VAR384    DRG GRANTS - # R01 APPS APPROVED
INC026    % EXPD FOR ADMIN & GENL EXPENSE
INC017    % TOTAL EXPD FOR SPON RESEARCH                2
STC045    RAT: BMS GRAD-STUD TO UNDERGRAD MD-STUD       1
INC004    ADJUSTED TOTAL REVENUE                        2
STC013    % 1ST-YR MD-STUD: PRE-MED GPA 3.6-4.0         2

1 Variable used in 1975 and 1976 cluster analysis studies
  (Nunn and Lain, 1975; McShane, 1977a).
2 Variable used in exploratory analyses of the relations of
  institutional variables (Sherman, 1976 and 1977a).
3 The inverse of this variable, the ratio of full time faculty
  to the number of medical students, was used in both 1 and 2.
4 A similar variable, the percentage of non-U.S.-Canadian
  medical students, was used in 2 above.

In addition, the final data set contained predominantly 
computed variables (ratios and percentages) rather than the 
original variables taken from IPS. Only 8 of the 33 
variables were IPS data elements; the other 25 measures 
were computed from IPS data. The reason for selecting 
predominantly computed variables was that computed variables 
allow for comparisons of emphasis rather than extensiveness 
and for illumination of institutional characteristics other 
than overall "size". 



Factor Analysis 

The second step of the analysis performed in the course
of this study was the factor analysis of the 33 selected
variables described above. The data reduction technique
actually employed was principal components analysis. One
of the assumptions underlying most factor analytic techniques
is that the variance in each variable in a set can be broken
down into common variance (the variance shared by the other
variables in the set) and the variance that is unique to
the particular variable. In principal components analysis,
however, no assumptions are made about the structure
underlying the variables in the analysis. Instead, the variables
are mathematically transformed so that the first component
extracted accounts for as much of the variance in the data
as possible and each subsequent component extracted accounts
for as much of the remaining variance in the data as possible
(Mulaik, 1972). In this manner it is possible to determine
whether a large proportion of the variance in a set of
variables can be explained by a relatively small number of
dimensions (components).
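
The extraction of successive components can be illustrated with a
minimal sketch. It is not the program used in the study. It assumes
a covariance matrix as the starting point (standardizing each
variable first would give the correlation-matrix version), finds
each component by power iteration, and removes the extracted
variance before looking for the next component. The function names
and the toy data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix for a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    p = len(matrix)
    vec = [float(j + 1) for j in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # no variance left to extract
            return 0.0, [0.0] * p
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(p) for j in range(p))
    return value, vec


def principal_components(rows, k):
    """Return (variance, direction, share of total variance) for k components."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    components = []
    for _ in range(k):
        value, vec = leading_component(cov)
        components.append((value, vec, value / total))
        # Deflate: remove the variance just extracted before the next pass.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return components


# Toy data: five hypothetical schools measured on two variables.
schools = [(2.0, 0.0), (0.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
components = principal_components(schools, 2)
```

In this toy example the first component carries four fifths of the
total variance and the second carries the rest, which is the kind of
cumulative share reported for the components in the study.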

In the current study, the first 9 components extracted 
accounted for 74.4 percent of the variance in the data. A 
number of varimax rotations were performed in which different 
numbers of the components, ranging from 9 down to 4, were
rotated. These six solutions were then compared, and the
8-component solution was selected as the most interpretable
and intuitively appealing. The eight components were
explained in some detail by Sherman (1977b) and served as
the basis for the cluster analyses described in this report. 

Computation of Similarities

An important conceptual step in conducting a cluster 
analysis, and one which is often transparent to both user 
and consumer, is the computation of indices of similarity. 
Since the goal of cluster analysis is to construct clusters
containing objects that are as similar as possible, some 
measure of similarity (or its converse, dissimilarity or 
distance) is necessary. Measures of similarity include 
coefficients of association and correlation; measures of 
dissimilarity or distance include weighted and unweighted 
Euclidean distance coefficients, the "city-block" metric, 
and the Mahalanobis generalized distance coefficient. The 
various methods of computing similarity indices are 
discussed in many of the texts on cluster analysis including 
those by Anderberg (1973), Everitt (1974), and Bailey (1975). 

In this study, distances were computed between each of 
the pairs of schools using the Euclidean distance coefficient. 
For a given pair of schools, the Euclidean distance is equal 
to the square root of the sum of the squared differences 
between the two schools on each variable. One advantage of 
this type of distance coefficient is that it has an easily 
interpretable and unique zero point. The distance between
two schools can be zero if and only if they have identical 
values on all variables. Negative distance is undefined and 
larger coefficients imply that schools are farther apart on 
one or more variables. 
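
The distance computation described above can be sketched as
follows. The function names and the profile values are hypothetical
and serve only to show the zero point and the symmetry of the
coefficient.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared differences across all variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(profiles):
    """Symmetric matrix of distances between every pair of profiles."""
    return [[euclidean_distance(p, q) for q in profiles] for p in profiles]


# Hypothetical factor-score profiles for three schools (illustrative only).
profiles = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]
distances = distance_matrix(profiles)
```

The first and third profiles are identical, so their distance is
zero. The first and second differ by 3 and 4 on the two variables,
so their distance is 5.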

It is important to note that in the computation of the 
Euclidean distance described above, each variable is equally 
important in determining the distance coefficient between 
pairs of schools. Important variables may be given added 
impact in an analysis by weighting those variables. Alterna- 
tively, variables which have little heuristic importance may 
be dropped from the analysis. 
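
A weighted form of the coefficient makes both options concrete. In
the sketch below, which assumes one non-negative weight per
variable, a weight of 1 reproduces the unweighted distance, a larger
weight gives a variable added impact, and a weight of 0 drops it.
The weights and values are hypothetical.

```python
import math


def weighted_euclidean(a, b, weights):
    """Euclidean distance with each squared difference scaled by a weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


school_a = (1.0, 2.0, 0.0)
school_b = (4.0, 6.0, 10.0)

# Every variable counts equally.
equal = weighted_euclidean(school_a, school_b, (1, 1, 1))
# The third variable is removed from the analysis.
dropped = weighted_euclidean(school_a, school_b, (1, 1, 0))
# The first variable is given added impact; the third stays removed.
emphasized = weighted_euclidean(school_a, school_b, (4, 1, 0))
```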

Hierarchical Cluster Analysis 

The cluster analysis performed in this study was actually 
a two-step process. Initially, hierarchical cluster analysis 
was performed using a technique developed by Ward (1963).
The results of the hierarchical cluster analysis were then 
used to give indications of the number of clusters of schools 
present, based on the factor scores used as input, and the 
schools which could be used as starting points for the non- 
hierarchical cluster analysis. 

Generally speaking, hierarchical cluster analysis is a
class of empirical methods of forming objects into groups
through a series of stepwise merges. At first, each object
is in a group of its own. Two groups are joined to form a 
larger group. Then, again, two of the remaining groups are 
merged. This continues until all objects are combined into 
a single group. At each step of the merging process, the 
two most similar of the groups are combined, and once 
combination has taken place the groups remain intact for the 
duration of the analysis. By forcing all objects to be 
combined, hierarchical cluster analysis allows for distortion 
of natural clusters by the inclusion of outlying objects. 

Ward's hierarchical cluster analysis method defines the 
distance between clusters as the distance between the centers 
of the clusters (the cluster centroids) and uses as its 
criterion the increase in the sum of the squared distances 
from the objects in the cluster to the cluster centroid. 
At each step of the analysis, the two clusters that cause the 
least increase in the sum of squared distances within clusters 
are combined. Stated another way, the Ward method attempts 
to minimize differences within clusters and maximize differences 
among clusters. 
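
The merging rule can be sketched in a few lines. This is a naive
illustration, not the program used in the study. It relies on the
standard identity that merging clusters A and B raises the
within-cluster sum of squares by n_A * n_B / (n_A + n_B) times the
squared distance between their centroids. The function names and
the points are hypothetical.

```python
def centroid(cluster):
    """Mean of the cluster's points on each variable."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b were merged."""
    gap = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return len(a) * len(b) / (len(a) + len(b)) * gap


def ward_clusters(points, k):
    """Merge the cheapest pair of clusters repeatedly until k remain."""
    clusters = [[p] for p in points]  # each object starts in its own group
    while len(clusters) > k:
        _, i, j = min((ward_cost(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merged = clusters[i] + clusters[j]
        # Once merged, the two groups stay together for the rest of the run.
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters


points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
groups = ward_clusters(points, 3)
```

Stopping at three clusters leaves the outlying point (5, 20) in a
group of its own. Continuing to a single cluster would force it
into one of the natural pairs, which is the distortion noted above.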

In the study described in this report, 110 U.S. medical 
schools were hierarchically clustered on the basis of their 
values on 8, 6, and 5 factor scores. These three analyses 
were conducted to assess the impact of selected factor scores 
on the hierarchical solution, and specifically to determine 
whether the omission of the control factor would have benefi- 
cial results in the interpretation of the clusters. It should 
be noted that, unlike previous AAMC clustering studies, no 
variable was given disproportionate weight in determining the 
distance index between pairs of groups. 

Non-Hierarchical Cluster Analysis

The information provided by the hierarchical cluster
analysis was used to initiate a refinement of the derived clusters 
through non-hierarchical cluster analysis. Non-hierarchical 
cluster analysis places all objects into a predetermined number 
of clusters in such a way that a specified criterion is opti- 
mized. This kind of procedure avoids the problem of objects 
necessarily remaining together once they have been combined 
and reduces the effects of outlying objects on cluster member- 
ships. However, in order to use a non-hierarchical cluster 
analysis it is preferable to have some idea of the number of 
clusters or groups of objects that exist based on the data
at hand, and to be able to provide some indication of the 
approximate location of the "centers" of the clusters. In
this study the result of the hierarchical cluster analyses 
was used to provide a range of the number of clusters 
present and initial cluster "centers", one school from each 
cluster in the hierarchical solution, for the non-hierarchical 
cluster analysis. 
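
The report does not state how the one school per cluster was
chosen. The sketch below assumes, purely for illustration, that the
member closest to its cluster centroid is taken as the starting
point. The function names and the data are hypothetical.

```python
def centroid(cluster):
    """Mean of the cluster's members on each variable."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def pick_seed(cluster):
    """Assumed rule: the member closest to the cluster centroid."""
    center = centroid(cluster)
    return min(cluster,
               key=lambda p: sum((x - c) ** 2 for x, c in zip(p, center)))


# A hypothetical two-cluster hierarchical solution.
hierarchical_solution = [
    [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)],
    [(10.0, 10.0), (10.0, 12.0), (10.0, 11.5)],
]
seeds = [pick_seed(cluster) for cluster in hierarchical_solution]
```

Each seed is an actual member of its cluster rather than a computed
mean, which matches the report's description of using schools as
initial "centers".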

The non-hierarchical cluster analysis method used in 
this study was developed by Forgy (1965) and is known as the 
K-means technique. Using the number of clusters and cluster 
centroids specified by the user, each object is assigned to 
the cluster with the closest centroid. After all objects 
have been initially assigned to a cluster, new cluster 
centroids are computed for each cluster based on the objects 
assigned to the cluster. A cluster centroid is a point in 
p-dimensional space (where p is the number of variables)
defined by the mean of the objects in the cluster on each 
variable. The distance of each object from each of the 
cluster centroids is then computed and objects are reassigned, 
if necessary, to the cluster which now has the closest 
centroid. After the reassignment of objects, the cluster 
centroids are recomputed, and a new cycle of computing 
distances, reassigning schools and recomputing cluster 
centroids is begun. This cycle is repeated until no objects 
are reassigned after cluster centroids have been calculated. 
This procedure, like the Ward technique, minimizes the 
differences of objects within the clusters but without the 
artificial permanence of cluster membership inherent in the 
hierarchical approach. 
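
The cycle described above can be sketched as follows, assuming
squared Euclidean distance and user-supplied starting centroids.
This is an illustration of the assign-and-recompute loop and not
the program used in the study. The function names, the cycle limit,
and the data are hypothetical.

```python
def squared_distance(a, b):
    """Sum of squared differences across all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, seeds, max_cycles=100):
    """Assign each point to its closest centroid, recompute, and repeat."""
    centroids = [tuple(seed) for seed in seeds]
    assignment = None
    for _ in range(max_cycles):
        new_assignment = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:  # no object was reassigned
            break
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assignment, centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
assignment, centroids = k_means(points, seeds=[(0.0, 0.0), (10.0, 2.0)])
```

Because an object is compared with every centroid on every cycle,
it can move to a different cluster whenever the centroids shift,
which is the property that distinguishes this procedure from the
hierarchical merges.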

In this study several non-hierarchical cluster analyses
were performed using the Forgy method. Numbers of clusters 
ranging from 12 down to 6 were derived using both 5 and 6 
factor scores as input. From the variety of possible 
clusterings, an 8 cluster solution based on 6 factor scores 
was selected for presentation in this report based on its 
representation of the schools and their similarities. The 
rationale through which this solution was selected and a
description of the clusters in terms of both the factor
scores and the original variables are presented in the
following chapter.



Chapter III



RESULTS AND DISCUSSION 



The results of this study were derived at three 
different stages of the analytic sequence. The factor 
analysis and hierarchical cluster analysis each produced 
results which were utilized at later stages; and the non- 
hierarchical cluster analysis produced the final clusters 
of medical schools. The results of each step of the analysis 
will be presented in this chapter. 

Factor Analysis of 33 Measures of Institutional Characteristics 

As described in Chapter II, the first step in the analysis
for this study was the factor analysis of 33 variables selected
from IPS. The 33 variables were selected to represent
several measurable aspects of medical schools in the U.S.,
including institutional, financial, faculty, student, and
curricular characteristics.

The rotated factor pattern matrix which resulted from the 
factor analysis is presented in Table 2. The matrix was the 
result of a study by Sherman (1977b) and is discussed in 
detail in the report of that study. For the purposes of this 
report, the factor pattern matrix will be interpreted only
briefly. 

Factor 1 provides a means for assessing the graduate 
medical education program emphasis among medical schools. 
Schools which are strong in this area would typically have a 
high ratio of interns and residents to undergraduate medical 
students, proportionally more faculty who hold MD degrees, 
higher faculty salaries, and fewer undergraduate medical
students per full time faculty member. Interestingly, schools
with these qualities have in the past produced a relatively
small proportion of graduates who went into general practice. 

Factor 2 measures the size and age of the medical schools. 
This factor bears out the common assertion that older schools 
tend to have greater numbers of undergraduate medical students 
and larger proportions of alumni who have achieved board 
certification. Secondary loadings on this factor indicate 
that older medical schools are experiencing less growth in 
enrollment and federally sponsored research funding than newer 
schools. While these findings are not particularly startling, 
it is interesting to note that these measures do form an
independent dimension empirically unrelated to the other
seven factors derived in this analysis.



TABLE 2

EIGHT COMPONENT VARIMAX FACTOR PATTERN RESULTING FROM
PRINCIPAL COMPONENTS ANALYSIS OF 33 VARIABLES
DESCRIBING U.S. MEDICAL SCHOOLS, 1976


Factor 3 measures the control dimension among medical 
schools. The variables which have their highest loadings 
on this factor are control (in which public schools were 
represented by a '0', private by a '1'), and other variables
which are related to the degree to which a school resembles
public or private medical schools: resident medical student
tuition, the percent of in-state medical students, the
number of applicants per first year medical student, the
percent of the school's revenue which comes from federal
sources, and the percent of revenue from gifts. Schools
which have high values on this factor tend to resemble most
private schools in that they have relatively high resident
tuition, few resident students, and high numbers of applicants 
per first-year medical student. These schools also tend to 
receive a greater proportion of their revenues from the 
federal government and from gifts than do schools which are 
more similar to public medical schools. 

Factor 4 assesses the medical schools' involvement with 
ethnic minority faculty and students. It is evident from the 
variables loading on this factor that schools with high 
proportions of ethnic minorities among their faculty and 
students and proportionally high enrollments of foreign 
medical students would have high values in the fourth factor. 
Closer inspection of the data revealed that the inclusion of 
data from two historically Black medical schools, Howard and 
Meharry, and the University of Puerto Rico probably had a 
great deal of influence on the emergence of this factor. 

Factor 5 measures the research funding success of the 
medical schools on applications for new single-investigator 
research (R01) grants from NIH. Schools with high
approval rates also have the "best" priority scores (where a 
lower score reflects a higher priority) and are awarded 
a higher percentage of the sum of dollars requested on 
all reviewed R01 proposals. Interestingly, schools which 
possess these qualities also tend to have a relatively high 
proportion of female medical students. It is also interesting 
that this dimension of institutional differences is apparently 
independent of other measures of research emphasis which 
combined to form a separate factor. 

Factor 6, which was formed by the only two curriculum
variables in the variable set, measured the degree to which 
curriculum electives were used by the medical schools. 
The isolation of these two variables indicates that the
curriculum information available, in addition to being 
scarce and not readily amenable to studies of this type, is 
independent of other dimensions on which medical schools 
were observed to vary. 

Factor 7, which measures the developmental stage of 
the medical schools, illustrates the tendency of schools to 
grow simultaneously in all areas. The three variables which 
have their highest loadings on the seventh factor, one each 
from the student, faculty, and institutional domains, are 
all potential indicators of institutional growth. Thus, this 
factor may distinguish developing from established schools. 

The final factor, Factor 8, measures the research emphasis 
of medical schools. The variables which have high loadings on 
this factor are primarily related to the extent and emphasis 
of sponsored research activity. Schools with a strong research 
emphasis have relatively high percentages of their budgets 
expended for sponsored research, large numbers of research 
grants approved, high ratios of basic medical science graduate 
students to undergraduate medical students, high percentages 
of students with superior undergraduate grade point averages, 
and low percentages of expenditures for administration and 
general expenses. 

To summarize, the factor analysis of 33 variables selected 
from IPS to represent the complete range of medical school 
activities resulted in the following eight factors: 
(1) graduate medical education emphasis, (2) size and age, 
(3) control, (4) minority participation, (5) research funding 
success, (6) curriculum electives, (7) developmental stage, and 
(8) research emphasis. Only three of the factors, numbers 1, 
3, and 8, were similar to factors derived in earlier AAMC 
studies (Sherman, 1976 and 1977a; McShane, 1977a). Factor 2, 
which was labelled "Size and Age" here, is similar in content 
to factors labelled "Undergraduate Medical Education" elsewhere 
(Keeler, 1972; McShane, 1977a). Factors 4 through 7 
represent new dimensions of medical schools which have 
previously been unexplored. 

Hierarchical Cluster Analysis 

Based on the factor analysis described in the preceding 
section, eight factor scores were computed for 110 medical 
schools. Factor scores were computed for those schools which 
were missing values for less than 20 percent of the 33 variables 
(fewer than six variables). The amount of missing data allowed 
in this study was based on the proportion of missing data 
allowed in other studies of this type (Nunn and Lain, 1976; 
McShane, 1977a).* The seven schools dropped from the analysis 
due to insufficient data at this point were Baylor University, 
University of North Dakota, University of Hawaii, Eastern 
Virginia Medical School, Wright State University, University 
of South Carolina, and the Uniformed Services University of 
the Health Professions. Only the first three of these schools, 
however, were in complete operation at the time the data on 
which this study is based were collected. 
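
The screening rule described above can be sketched as follows. This 
is a minimal illustration and not the procedure actually run at AAMC: 
the function names and the three example records are invented, 
missing values are represented as None, and only the 20 percent rule 
is applied. 

```python
# Hypothetical illustration: names and data are invented, not taken from IPS.
MAX_MISSING_FRACTION = 0.20  # the 20 percent threshold described in the text


def has_enough_data(values, max_missing_fraction=MAX_MISSING_FRACTION):
    """Return True if less than the allowed fraction of values is missing (None)."""
    missing = sum(1 for v in values if v is None)
    return missing < max_missing_fraction * len(values)


def screen_schools(records):
    """Split {school: [values]} into lists of retained and dropped school names."""
    kept = [name for name, vals in records.items() if has_enough_data(vals)]
    dropped = [name for name in records if name not in kept]
    return kept, dropped


if __name__ == "__main__":
    records = {
        "School A": [1.0] * 33,                # complete record
        "School B": [1.0] * 28 + [None] * 5,   # 5 of 33 missing: retained
        "School C": [1.0] * 23 + [None] * 10,  # 10 of 33 missing: dropped
    }
    print(screen_schools(records))
```

Factor scores would then be computed only for the retained schools. 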

The eight factor scores were used as input to Ward's 
hierarchical cluster analysis. Three separate hierarchical 
clusterings were performed based on different sets of factor 
scores. The effect of using different combinations of factor 
scores in an analysis is similar to that of using various 
combinations of variables of any type. The results of cluster 
analysis are inherently sensitive to the data on which the 
distances between pairs of schools are computed, and the 
resultant clusters may be very different when different 
variables are used. Since one of the goals of this study was 
to delineate clusters of medical schools which vary on meaningful 
dimensions, a limited number of combinations of factor 
scores were used and the results compared for interpretability. 
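
For readers unfamiliar with Ward's method, the following is a 
minimal sketch of its merge rule, assuming the usual textbook 
formulation: at each step the two clusters are merged whose union 
produces the smallest increase in the within cluster sum of squared 
distances. The function names and the four example points are 
hypothetical, and the report does not describe the program actually 
used. 

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_merge_cost(size_a, centroid_a, size_b, centroid_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    return size_a * size_b / (size_a + size_b) * sq_dist(centroid_a, centroid_b)


def ward_cluster(points):
    """Agglomerate all points; return merges as (members_a, members_b, cost)."""
    # cluster id -> (member indices, centroid)
    clusters = {i: ([i], list(p)) for i, p in enumerate(points)}
    history = []
    next_id = len(points)
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        # Exhaustive search for the cheapest pair (fine for small examples).
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                members_a, cen_a = clusters[a]
                members_b, cen_b = clusters[b]
                cost = ward_merge_cost(len(members_a), cen_a, len(members_b), cen_b)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        members_a, cen_a = clusters.pop(a)
        members_b, cen_b = clusters.pop(b)
        na, nb = len(members_a), len(members_b)
        merged = [(na * x + nb * y) / (na + nb) for x, y in zip(cen_a, cen_b)]
        clusters[next_id] = (members_a + members_b, merged)
        history.append((sorted(members_a), sorted(members_b), cost))
        next_id += 1
    return history


if __name__ == "__main__":
    demo = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    for step in ward_cluster(demo):
        print(step)
```

The merge costs sum to the total sum of squared distances of all 
points from the grand centroid, which is why the final merge can 
serve as the 100 percent mark of the criterion. 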

The first hierarchical cluster analysis performed was 
based on all eight factor scores. The results of this analysis 
(presented in Appendix B-1) seemed to indicate that the major 
element on which the clustering was based was the minority 
factor, and did not appear readily interpretable in terms of 
the clusters which were derived. At this point, therefore, two 
factor scores, Minority and Curriculum Electives, were dropped 
from the variable set. These two dimensions were considered 
less important than the remaining six in determining clusters 
of medical schools. 

The second hierarchical cluster analysis, based on six 
factor scores, resulted in potentially interesting groupings 
of schools, and will be discussed in more detail below. 
However, to assess the impact of the control dimension on the 
clustering, the control factor score was dropped and the 
schools were clustered a third time on five factor scores 
(factors 1, 2, 5, 7, and 8). The results of the hierarchical 
clustering based on five factor scores (presented in Appendix 
B-2) did not appear to have any compelling qualities which 
made it inherently more meaningful than the six factor clustering. 
A non-hierarchical cluster analysis based on the five 
factor scores was also performed. The results of that analysis 
are presented in Appendices C-1 and C-2 for comparison with 
the results of the analysis based on six factor scores, the 
principal analysis described in this study. 

* An investigation of the effects of missing and distorted data 
on solutions involving similarities among schools and 
alternative methods for compensating for such effects is 
anticipated during the next phase of this series of studies. 

The results of the Ward hierarchical cluster analysis 
on six factor scores are presented in Figure 1. This tree- 
diagram, or dendrogram, depicts the merge sequence that 
developed in the analysis in 25 equal intervals. Each interval 
represents four percent of the total within cluster sum of 
squared distances at the final merge (when all schools were 
merged into a single cluster). From the diagram it is apparent 
that the majority of the combinations produce relatively little 
within cluster deviation from cluster centroids. The first 
92 merges, principally combinations of small groups of schools, 
accounted for only 25 percent of the total sum of within 
cluster deviations. By contrast, the final five merges 
accounted for over 40 percent of the increase in the criterion. 
On the basis of this information it was determined that the 
medical schools could probably be best represented by some- 
where between 5 and 17 groups. In other words, based on the 
information contained in Figure 1, representing the schools 
by as many as 17 clusters would leave schools which are 
relatively very similar in different clusters, but represent- 
ing the schools by as few as 5 clusters would force some 
schools into clusters in which they do not belong. 
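
The reading of the dendrogram described above amounts to asking what 
share of the final criterion value has accumulated when a given 
number of clusters remains. A minimal sketch, assuming a list of 
per-merge increases in the criterion is available (the function name 
and the example values are hypothetical and are not the study's 
figures): 

```python
def criterion_share(merge_costs):
    """Map 'number of clusters remaining' to the share of the final criterion.

    merge_costs lists the increase in the within-cluster sum of squares
    produced by each merge, in the order the merges occurred.
    """
    total = sum(merge_costs)
    n_objects = len(merge_costs) + 1
    shares = {n_objects: 0.0}  # before any merge every object is its own cluster
    running = 0.0
    for merges_done, cost in enumerate(merge_costs, start=1):
        running += cost
        shares[n_objects - merges_done] = running / total
    return shares


if __name__ == "__main__":
    # Invented costs for five objects (four merges).
    demo_costs = [1.0, 1.0, 3.0, 15.0]
    for k, share in sorted(criterion_share(demo_costs).items(), reverse=True):
        print(k, "clusters:", round(100 * share, 1), "percent of criterion")
```

A sharp jump in the share between successive cluster counts suggests 
that the later merge is combining groups that are not very similar. 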

Non-hierarchical Cluster Analysis 

Based on the results of the six factor hierarchical 
cluster analysis, an optimal solution was sought using 
Forgy's non-hierarchical cluster analysis method. The results 
of the hierarchical clustering were used to give an indication 
of the number of clusters which would represent the schools, 
and schools were selected as seedpoints for the non- 
hierarchical cluster analysis based on the hierarchical 
clusters. In the hierarchical cluster analysis and on 
further inspection of the data, one school, the Mayo Medical 
School, appeared so dissimilar from the other 109 schools 
that it was not included in further comparisons. 

The Forgy non-hierarchical cluster analysis technique 
complements the Ward hierarchical method by optimizing the 
same criterion, the sum of the squared distances of the 
schools from the cluster centroids, but does not maintain 
the permanence of cluster membership inherent in the hierarchical 
methods. Several non-hierarchical solutions were 
obtained using varying numbers of clusters (12, 10, 8 and 6) 
and different sets of seedpoints (initial cluster centroids). 
The eight cluster solution was selected for presentation and 
description in this report. 
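
The general idea of a seeded, non-hierarchical reassignment 
procedure can be sketched as below. This follows a common textbook 
reading of Forgy's method (assign every object to its nearest 
centroid, recompute the centroids, and repeat until membership stops 
changing); the report does not give the details of the program used, 
and the function names and example data are hypothetical. 

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def forgy_cluster(points, seeds, max_iter=100):
    """Iteratively reassign points to the nearest centroid, starting from seeds."""
    centroids = [list(s) for s in seeds]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # membership is stable, so the centroids are too
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[k] = mean_point(members)
    return assignment, centroids


if __name__ == "__main__":
    demo = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(forgy_cluster(demo, seeds=[demo[0], demo[2]]))
```

Unlike a hierarchical merge sequence, an object placed in one 
cluster early on can move to another cluster in a later pass. 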

Figure 2 presents the composition of the eight clusters 
derived in the Forgy analysis and the profile of each cluster 
centroid on the six factor scores used in the analysis. The 
schools in each cluster are listed in the left hand column 
of the table, and the mean scores for the schools in the 
cluster on the six factors are graphed as cluster profiles. 

To aid in the interpretation and understanding of the 
clusters, the means of the schools in each of the clusters 
on selected variables from the factor analysis are presented 
in Table 3. In consideration of Table 3 it must be 
remembered that the factor scores were computed for some 
schools which were missing data on some variables. As a 
result of that process, the means are computed based on the 
number of schools in a given cluster that had data for that 
particular variable. 

Cluster 1, the first cluster depicted in Figure 2, is 
made up of 17 public medical schools, which are all 
established schools, but which, as a group, have no other 
distinguishing characteristics that can be seen in their 
cluster profile. The schools in Cluster 1 are below the 
average for all medical schools in emphasis on graduate 
medical education, development, research funding success, 
and research emphasis. The schools which form the cluster 
have an average enrollment of slightly over 500 undergraduate 
medical students, 95 percent of whom are from the state in 
which the school is located. These schools tend to be 
among the least expensive to attend (average tuition $1,166), 
and they have the smallest ratio of applicants per first year 
medical student of any of the eight clusters. 

The schools which combined to form Cluster 2 are, as a 
group, the oldest and largest of the 109 medical schools. 
Six of the 8 schools in the cluster are public schools with an 
average enrollment of 883 undergraduate medical students. 
These schools resemble the schools in Cluster 1 in that they 
do not place much emphasis on either graduate medical 
education or research, and their research funding success is 
slightly below average. The schools which make up Cluster 2 
may be characterized as having a high ratio of undergraduate 
medical students per full-time faculty, slightly below 
average resident tuition rates and ratios of applicants per 
first year medical students, and slightly above average 
amounts of total revenue. 

[FIGURE 2. CLUSTER MEMBERSHIP AND PROFILES OF CLUSTER 
CENTROIDS ON SIX FACTOR SCORES] 

[TABLE 3. MEAN VALUES FOR EIGHT CLUSTERS OF U.S. MEDICAL SCHOOLS ON SELECTED VARIABLES 
FROM THE 33-VARIABLE FACTOR ANALYSIS OF INSTITUTIONAL DESCRIPTIVES, 1976] 

The 16 schools which comprise Cluster 3 are public 
schools which have a high degree of research emphasis and 
research funding success as opposed to graduate medical 
education emphasis. These schools are of moderate size and 
age and are not in the process of development, except in the 
area of research emphasis. The schools in this cluster 
experienced a 45 percent growth in DRG research support 
between 1966-67 and 1972-74, and they devote a relatively 
low percentage (6.5%) of their expenditures to administra- 
tion and general expense. 

Cluster 4 consists of 14 medical schools which are well 
established and have a strong graduate medical education 
program in addition to their undergraduate medical education 
program. These schools have an average of almost 650 
undergraduate medical students, but have a low ratio of 
undergraduate medical students per full-time faculty member. 
The comparative strength of the medical schools in this 
cluster is illustrated by the fact that Cluster 4 has the 
highest mean values of the eight clusters on the following 
variables: average salary (strict full-time basic science 
associate professor), percent of faculty with an M.D. degree, 
R01 application approval rate, number of R01 applications 
approved by Initial Review Groups, and total revenue. In 
addition, the schools in Cluster 4 have the second highest mean 
values of the eight clusters on ratio of housestaff (interns 
and residents) to undergraduate medical students, percent of 
living alumni who are board certified, and percent of total 
expenditures devoted to sponsored research. It is interest- 
ing to note, however, that an average of only 10 percent of 
the living alumni of the schools in Cluster 4 were in 
general practice. 

Cluster 5 is a group of primarily new medical schools 
which either are two-year schools or were not operating with 
full student bodies in 1974-75. The schools in this cluster 
had the lowest average enrollments, the lowest ratio of 
housestaff to undergraduate medical students, the lowest 
percentage of faculty holding M.D. degrees, and correspondingly, 
the highest proportion of expenditures devoted to 
administration and general expenses. It may very well be 
that these five schools, as well as the Mayo Medical School, 
are so distinct that they are not representative of the 
general population of medical schools at the current time. 
The development of these schools and their changing patterns 
of similarity to the rest of the population may merit special 
consideration as the schools become established. 

By comparison, the schools in Cluster 6 are relatively 
new, public medical schools which are currently experiencing 
rapid development. While they are below average in size and 
age and research emphasis, the schools have had a moderate 
degree of research funding success and have slightly above 
average emphasis on graduate medical education. The most 
notable characteristic of the schools in this cluster is 
that they have the highest average values for both change 
in federal research support and projected change in enroll- 
ment. These schools have the lowest average in-state tuition, 
enroll over 93 percent in-state undergraduate medical students, 
and have the third-highest ratio of applicants per first 
year medical student. In addition, they utilize relatively 
more volunteer faculty than any other group of schools and 
devote almost as much of their total expenditures to admin- 
istration and general expenses (14.33 percent) as to spon- 
sored research (14.66 percent). 

The final two clusters are composed of established, largely 
private schools with almost complementary profiles in other 
respects. The schools in Cluster 7 are slightly above average 
in size and age and have a moderately high degree of research 
funding success, but place low emphasis on graduate medical 
education and research compared to other medical schools. 
The schools in this cluster tend to be of average size, but 
have the lowest average total revenues of the clusters of 
established schools. As a group, these schools are the most 
expensive to attend, enroll the fewest undergraduate medical 
students from the states in which they are located, and have 
the highest number of applicants per enrolled first year 
medical students of any of the clusters. 

The schools in Cluster 8, by way of contrast, place a 
strong emphasis on both research and graduate medical 
education, but tend to have slightly fewer undergraduate 
medical students and slightly less research funding success 
than the average school. The schools in Cluster 8 have the 
highest ratio of housestaff to undergraduate medical students 
and the lowest ratio of undergraduate medical students to 
full-time faculty of all clusters. They also have the second 
highest average total revenue of all clusters and receive the 
highest proportion of their revenues from the federal government 
of any of the clusters. 

The preceding paragraphs describe the eight clusters 
which were derived in the course of the current study. 
However, not every school fits equally well into the cluster 
of which it is a member. One measure of how well a school 
fits into a cluster is the distance from the school to the 
cluster centroid. The membership of the clusters and the 
distance of each school from the cluster centroid are presented 
in Table 4. In examining Table 4 it should be remembered 
that the schools are in the cluster with the closest centroid 
and that one of the basic assumptions of cluster analysis is 
that all objects (schools) are placed into one of the clusters. 
As a result, the clusters vary in the degree of homogeneity, 
or similarity, of the schools which they contain. Three of the 
clusters (numbers 1, 2, and 4) appear from the information 
in Table 4 to be reasonably homogeneous. The remaining five 
clusters each tend to have several schools close to the 
cluster centroid with a smaller number on the periphery of 
the cluster. 
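
As an illustration of this measure of fit, the sketch below computes 
the distance of each member from its cluster centroid and lists the 
members from best to worst fitting. It assumes plain Euclidean 
distance on the factor scores; whether the tabled values are 
distances or squared distances is not stated in the text. The school 
labels and scores are invented. 

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def members_by_distance(cluster):
    """Return (name, distance to centroid) pairs, closest member first.

    cluster maps a school name to its list of factor scores.
    """
    center = centroid(list(cluster.values()))
    distances = {name: math.dist(scores, center) for name, scores in cluster.items()}
    return sorted(distances.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical two-factor scores for three schools in one cluster.
    demo_cluster = {"A": [0.0, 0.0], "B": [2.0, 0.0], "C": [4.0, 6.0]}
    for name, dist in members_by_distance(demo_cluster):
        print(f"{name:<4}{dist:8.4f}")
```

Members with large distances relative to the rest of their cluster 
correspond to the schools on the periphery noted above. 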

In general it is evident that the clustering described 
in this report reflects principally the size, age, and control 
of the schools. These characteristics were also evident in 
earlier studies (Nunn and Lain, 1976; McShane, 1977a). 
Differences in the composition of the clusters were the 
result of the changes in the variables selected, changes 
in the quality of the data used, and changes in the schools 
over time. It should be remembered that the current series of 
studies is exploratory in nature, and while the study described 
in this report represents an advancement over the 
previous studies, it is only one of an infinite complex of 
possible solutions. 

TABLE 4 

MEMBERSHIP OF EIGHT CLUSTERS OF U.S. MEDICAL SCHOOLS IN ORDER 
OF DISTANCE FROM CLUSTER CENTROID RESULTING FROM 
CLUSTER ANALYSIS OF SIX FACTOR SCORES, 1976 

School                 Distance 

CLUSTER 1 
MARYLAND                  .1386 
LOUISIANA NW ORL          .2262 
M.C. OF VIRGINIA          .4092 
TENNESSEE                 .4639 
LOUISVILLE                .5242 
OHIO                      .5584 
GEORGIA                   .5696 
ARKANSAS                  .5788 
MISSISSIPPI               .7469 
OREGON                    .7777 
SOUTH CAROLINA            .8160 
KENTUCKY                 1.0929 
NEBRASKA                 1.3232 
SOUTH ALABAMA            1.9552 
RUTGERS                  2.0372 
OKLAHOMA                 2.1762 
SOUTH DAKOTA             2.6781 

CLUSTER 2 
TEMPLE                    .8109 
WAYNE STATE              1.3371 
SUNY BUFFALO             1.4050 
SUNY DOWN STATE          1.4777 
TEXAS GALVESTON          1.7632 
INDIANA                  2.2552 
ILLINOIS                 2.6031 
JEFFERSON                2.6975 

CLUSTER 3 
UTAH                      .3168 
FLORIDA                   .5255 
KANSAS                    .7351 
COLORADO                  .9751 
NORTH CAROLINA           1.1698 
WISCONSIN                1.2245 
NEW MEXICO               1.3239 
U OF VIRGINIA            1.3523 
IOWA                     1.3950 
MISSOURI-COLUMB          1.4144 
WEST VIRGINIA            1.7121 
ALABAMA-BIRMNGHM         1.9385 
ARIZONA                  1.9873 
U OF WASH SEATTL         2.4595 
CONNECTICUT              4.5207 
PUERTO RICO              8.0911 

CLUSTER 4 
PITTSBURGH                .7754 
MIAMI                     .8758 
TEXAS SOUTHWEST          1.1703 
NEW YORK UNIV            1.2367 
EINSTEIN                 1.3062 
COLUMBIA                 1.3952 
SUNY UPSTATE             1.6207 
MINN-MINNEAPOLIS         1.6357 
U OF MICHIGAN            1.9033 
CALIF SAN FRAN           2.2119 
HARVARD                  2.2875 
NEW JERSEY               2.6667 
NEW YORK MED             3.2606 
CALIF L A                3.8145 

CLUSTER 5 
NEVADA                   2.0703 
SO. ILLINOIS             2.2886 
MINN-DULUTH              2.5629 
MISSOURI K.C.            3.1050 
CHICAGO MEDICAL          4.7853 

CLUSTER 6 
M.C. OHIO TOLEDO         1.0706 
TEXAS SAN ANTON          1.2566 
SOUTH FLORIDA            1.5081 
CALIF DAVIS              1.6373 
LOUISIANA SHRVPT         2.6802 
PENN STATE               2.9570 
TEXAS TECH               3.9943 
MASSACHUSETTS            4.0861 
MICHIGAN STATE           5.0520 
CALIF SAN DIEGO          5.0919 
CALIF IRVINE             7.0285 
TEXAS HOUSTON           17.7744 
SUNY STONY BRK          20.8547 

CLUSTER 7 
BOWMAN GRAY               .5787 
TULANE                    .7235 
ALBANY                    .7534 
HAHNEMANN                 .9137 
BOSTON                   1.0506 
NORTHWESTERN             1.3305 
CREIGHTON                1.6183 
HOWARD                   1.6467 
ST LOUIS                 1.6894 
GEORGETOWN               1.9770 
MEHARRY                  2.1976 
VERMONT                  2.4396 
TUFTS                    2.7253 
BROWN                    3.4968 
GEORGE WASH              4.0908 
LOMA LINDA               5.9505 
LOYOLA                   6.9853 
DARTMOUTH                8.5947 
M.C. OF PENN.           11.0754 

CLUSTER 8 
DUKE                      .4389 
STANFORD                  .6789 
WASH U ST LOUIS           .7999 
SOUTHERN CALIF            .8573 
ROCHESTER                 .8654 
EMORY                     .9160 
CASE WESTERN RES         1.3179 
U OF PENN.               1.5335 
VANDERBILT               1.6652 
CORNELL                  1.7411 
JOHNS HOPKINS            1.8317 
CINCINNATI               2.0827 
YALE                     2.1735 
MC OF WISCONSIN          2.1912 
U OF CHICAGO             2.3764 
MT SINAI                 6.4374 
RUSH MED COL             8.2129 

Chapter IV 
SUMMARY AND CONCLUSIONS 



The study described in this report applied methods 
developed in earlier clustering studies performed at AAMC to 
a different set of variables and produced eight clusters of 
medical schools. A total of 140 variables were extracted 
from the IPS Researchable Data Base. Through a series of 
correlational studies this number was reduced to a final data 
set of 33 variables representing several of the measurable 
dimensions in the data maintained in IPS. The 33 variables 
were factor analyzed, eight factors were rotated using a 
varimax criterion, and factor scores for the eight factors 
were computed for 110 U.S. medical schools. The schools 
were then grouped using two techniques sequentially; Ward's 
hierarchical cluster analysis was used initially to give an 
indication of the potential number of clusters and initial 
cluster centers, and a non-hierarchical cluster analysis 
technique developed by Forgy was subsequently used to refine 
the cluster memberships. A number of cluster analyses of 
both types were performed, varying the set of factor scores 
input and the number of clusters derived. A final solution 
which produced eight clusters based on six factor scores was 
selected as the most representative solution based on the 
selected data. 

The factor analysis resulted in eight factors describing 
the following dimensions: (1) graduate medical education 
emphasis, (2) size and age, (3) control, (4) minority participation, 
(5) research funding success, (6) curriculum electives, 
(7) development stage, and (8) research emphasis. Clustering 
110 medical schools on six of these factor scores (factors 1, 
2, 3, 5, 7, and 8) yielded eight clusters that represented 
reasonably homogeneous groupings of schools. However, each 
cluster retained distinguishing characteristics that allowed 
for representation of the group of schools as different from 
the other seven groups. 



Conclusions and Recommendations 

While the study described in this report does represent 
a step forward from the earlier AAMC cluster analysis studies, 
it represents only one possible solution based on a particular 
selection of variables. There are a number of methodological 
and application issues that need to be given consideration 
before a solution representing the most coherent possible 
set of clusters of medical schools can be obtained. Among the 
issues which merit consideration are the limitations of the 
data, the impact of missing and non-representative data on 
the solutions, the selection of variables, and the criteria 
for including schools in the analysis. 

The first of these issues, the limitations of the data, 
may be the final limitation on the utility of studies of 
this type in this area. The data in IPS are largely self- 
reported by the schools or extracted from other systems 
where it is self-reported by faculty members, students, 
applicants, or alumni. As a result, the data are only 
useful if they are reported completely and accurately, 
if the data collected are meaningful, and if the data are 
collected in such a way that they are comparable across 
institutions. Efforts aimed at enhancing the quality of 
some of the data in IPS in the ways noted above are being 
undertaken. If the data in the system are to be optimally 
useful in the context of studies of this type, or any other 
context, these efforts must be maintained where they now 
exist and increased wherever possible. 

The second issue, the impact of missing and non-repre- 
sentative data on clustering and scaling solutions, is a 
methodological problem the effects of which have not been 
completely ascertained. By using factor analysis and computing 
factor scores, the effects of missing data are somewhat 
obscured and may be compensated adequately. However, in the 
course of these studies it has become apparent that the 
effects of missing and non-representative data may be greater 
than previously anticipated. The degree of impact of missing 
and distorted data on cluster analysis and scaling solutions, 
especially when the original variables are used, should be 
determined and a method of compensating for these effects 
should be developed. 

The selection of variables also plays an important role 
in studies of this type. As noted earlier, the measures of 
similarity used in studies of this type are extremely 
sensitive to the variables on which they are based. Small 
changes in the variables selected may have considerable effect 
on the solutions. Variable selection is even more important 
in a situation, such as that with medical schools, in which 
all of the members of the population of interest are included 
in the analysis. The problem is one of sampling variables 
for analysis from the universe of variables, rather than 
sampling subjects. In studies of this type, the ramifications 
of sampling variables should be investigated. It may be that 
some alternative analytic technique, such as Alpha factor 
analysis (Kaiser and Caffrey, 1965), would serve better in the 
selection of variables. 

In this study, variables were selected on the basis of 
their potential for revealing dimensions not previously 
described among medical schools in addition to their completeness 
and representativeness. The effects of the variable 
selection are apparent in both the factor analysis and 
in the results of the cluster analysis. It would seem to be 
an appropriate next step to combine the knowledge gained from 
these results with that of previous studies to select, 
possibly with the aid of Alpha factor analysis, a new set 
of variables representing the universe of data in IPS. 

The final issue, that of the criteria for including 
schools in the analysis, is one which affects the application 
of these techniques to institutional data. There are two 
possible reasons for excluding schools from analysis, one 
being a high proportion of missing data, and the other being 
that a school is highly dissimilar to all other schools in 
the analysis. In this study, seven schools were excluded for 
the former reason, one for the latter. While the amount of 
missing data that exists primarily affects the degree to which 
data are representative of a school, the inclusion of schools 
which are highly unlike other schools affects the analysis 
itself. One of the underlying assumptions of cluster analysis 
is that all of the objects submitted to analysis will be 
placed in one of the clusters. The inclusion of outlying 
objects causes some distortion in the clusters and could 
possibly affect the cluster solutions in other ways. The 
desire to cluster as many schools as possible should be 
weighed against the effects of outlying schools on the 
cluster solution. 

In addition to the issues discussed above, there is 
also a need for further investigation into the clustering 
techniques themselves. These issues are more methodological 
than those discussed above, and include alternative methods 
of computing similarity among schools and of ascertaining the 
starting points for non-hierarchical cluster analyses. In 
the studies performed by AAMC to date the Euclidean metric 
has been used to compute similarities. While this method 
may be accurate and robust, there may be an alternative 
method, such as the "city-block" metric, which would depict 
the particular data under consideration differently and 
render more meaningful results. Similarly, for the starting 
points for non-hierarchical clustering, the use of representative 
schools may be an adequate method of specifying 
initial starting points, but it is also possible that some 
other method, such as using randomly selected schools or 
specific centroid coordinates, would be more applicable in 
the current context. 
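
The difference between the two metrics mentioned above can be shown 
with a small sketch. The Euclidean metric sums squared coordinate 
differences and takes the square root, while the city-block metric 
sums absolute differences, so the two can disagree about which object 
is closest. The function names and example points are hypothetical. 

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(target, candidates, metric):
    """Name of the candidate closest to target under the given metric."""
    return min(candidates, key=lambda name: metric(target, candidates[name]))


if __name__ == "__main__":
    target = [0.0, 0.0]
    candidates = {"P": [3.0, 3.0], "Q": [0.0, 5.0]}
    print("Euclidean nearest: ", nearest(target, candidates, euclidean))
    print("City-block nearest:", nearest(target, candidates, city_block))
```

In this example P is nearer under the Euclidean metric (about 4.24 
against 5), while Q is nearer under the city-block metric (5 against 
6), which is the kind of difference that could change a cluster 
solution. 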

In conclusion, the results of this study achieve the 
goals of clustering medical schools on the selected data. 
They provide a new and different basis for looking at medical 
schools and how they are similar to one another. There are 
several factors, however, that place limits on the universality 
of these results. 

APPENDIX A 

Abbreviations Used in 1976 
Researchable Data Base Variable Labels 



? 


Dollars 


§ 


Number 


n. 
15 


Percent 


* ung 


Percent Change 


A— Heaitn 


Allied Health 


Accel 


Accelerated 


TV 


Avcite , Activity 


A dm 


Administration 


Admin & Genl 


Administration & General 


Aamt: 


Admitted 


A am Frer 


Admittance-Preference 


Aau stag 


Advanced Standing 


TV T^f* 

AJ3C 


Atomic Energy Commission 


Atril 


Affiliated 


Agrmt 


Agreement 


Alum 


Alumni , Alumnae 


AlilG r 


American 


TV ml 

Amt 


Amount 


Ann 1 


Annual 


App 


Applications , Applicant 


Applicnts 


Applicants 


Apply 


Applying 


Appr 


Appropriations 


Assist 


Assistant (ASST) 


Assoc 


Associate 


Avail 


Available 


Av 


Average 


BA 


Bachelor of Arts 


Bas 


Basic (Sciences) 


Bal 


Balance 


BHRD 


Bureau of Health and Resources 


BMS 


Development 


Basic Medical Sciences 


BS 


Bachelor of Science 


Budg 


Budget (ed) 


Bus & Ind 


Business and Industry 


Ch 


Choice 


Chg 


Change 


Clin 


Clinical (Sciences) 
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APPENDIX A (Continued) 



Coll College 

Comm Comni ttee 

Comp Competing 

Con$ Constant Dollars (adjusted for 

inflation) 

Curr Curriculum 

D e f Deficit 

Deg Degree 

Dept Department (al) 

DHEW Dept. of Health, Education and Welfare 

Diff Difference 

Dir Direct 

Di sadv Di. sadvarftaged 

Etist Distributed - 

DOD Dept of Defense 

DRG Division of Research Grants (NIH) 

Ed Education , Educational (Educ) 

Elec . Electives 

Emerg-Med Emergency Medicine 

Endow Endowments 

Enroll Enrollment 

Equivs Equivalents 

Exp Expenditures (Expd) 

Fac Faculty 

Facil Facility 

Fed Federal 

Fern Female 

Yin Financial 

Fin-Yr Final Year 

p MG Foreign Medical Graduate 

p r From 

FT Full Timec 

Gen General 

Govt Government 

GPA Grade Point Average 

Grad Graduate 

GT Greater than 

jj|yjQ Health Maintenance Organization 

IMP AC DRG 1 s computer file of grants & 

contracts 
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APPENDIX A (Continued) 



Incl Including 

Indir Indirect (Ind) 

Innov Innovations 

Instr Instructor 

Instrct Instructional 

Intrn Interns 

IRG Initial Review Group (study section) 

LCME Liaison Committee on Medical Education 

Liv Living 

Log Logarithm 

LT Less Than 

Matric Matriculant 

MCAT Medical College Admissions Test 

MD-Stud Medical Student 

Med Medical 

Med-Sch Medical School 

Mid-Yr Middle Year 

Min Minority 

Mnlnd Mainland 

MS Master's degree 

Multi-Purp Multi-Purpose (MP) 

Multi-Serv Multi-Service 

NBME-1 National Board of Medical Examiners (test) - Part I 

NBME-2 National Board of Medical Examiners - Part II 

NIH National Institutes of Health 

NIMH National Institute of Mental Health 

Non-Govt Non-Governmental 

Non-Res Non-Resident 

NSF National Science Foundation 

Oper & Maint Operation and Maintenance 

Org Organized/Organizational 

Outpat Outpatient 

P-Scr Priority Score 

P01 Program and Project Grants 

Phys Physical 

Pop Population 

Pos Position 

Post-Docs Post-Doctorates 

Post-Grad Post-Graduates 

Prac Practice 







Pre-Med Pre-Medical 

Priv Private 

Prof Professional 

Prog Program (Pgm) 

Projtd Projected 

PT Part Time 

Pub Public 

Quant Quantitative 

R01 Traditional Research Grants 

Rat Ratio 

Rec Received 

Recov Recovery (RCOV) 

Reg Oper Expd Regular Operating Expenditures 

Rel Related 

Res Research 

Resrv Reserves 

Ret Retention 

Rev Revenues 

Rsdnt Resident 

Sal Salary 

SBMT Submitted 

Sch School 

Sci Science 

SD Standard Deviation 

Sep Separately 

Serv Service 

SFT Strict Full Time 

SMSA Standard Metropolitan Statistical Area 

Spec Special, Specialty 

Spons Sponsored 

Sq Square 

St & Loc State and Local (S&L) 

St Rel State Related 

Std Standardized 

Stud Student 

Tch-Trn Teaching and Training 

Tchng Teaching 

Tot Total 

Undergrad Undergraduate (Ungrad, UG) 

Underrep Under-represented 

Unk Unknown 

Unrestr Unrestricted 

US-Can United States and Canadian 

Vol Volunteer 

Yr Year 







ITEM NAME 

GEORGIA 
TENNESSEE 
SOUTH CAROLINA 
M.C. OF VIRGINIA 
OREGON 
MARYLAND 
OHIO 
NEBRASKA 
OKLAHOMA 
TEMPLE 
WAYNE STATE 
INDIANA 
SUNY BUFFALO 
ILLINOIS 
GEORGE WASH 
LOYOLA 
ST LOUIS 
TULANE 
HAHNEMANN 
GEORGETOWN 
BOSTON 
SOUTHERN CALIF 
ALBANY 
EMORY 
MC OF WISCONSIN 
ARKANSAS 
KANSAS 
MISSOURI-COLUMB 
IOWA 
LOUISIANA NW ORL 
MISSISSIPPI 
WEST VIRGINIA 
LOUISVILLE 
SOUTH DAKOTA 
CHICAGO MEDICAL 
LOMA LINDA 
BOWMAN GRAY 
VERMONT 
CREIGHTON 
PITTSBURGH 
U OF VIRGINIA 
SUNY UPSTATE 
TEXAS GALVESTON 
TEXAS SOUTHWEST 
JEFFERSON 
NORTHWESTERN 
SUNY DOWNSTATE 
NEW YORK MED 
TUFTS 
BROWN 
M.C. OF PENN. 
DARTMOUTH 
SUNY STONY BRK 
CALIF IRVINE 
TEXAS HOUSTON 
MISSOURI K.C. 
SO. ILLINOIS 
MINN DULUTH 
NEVADA 
PUERTO RICO 
SOUTH FLORIDA 
SOUTH ALABAMA 
TEXAS TECH 
LOUISIANA SHRVPT 
CONNECTICUT 
M.C. OHIO TOLEDO 
MASSACHUSETTS 
CALIF SAN DIEGO 
PENN STATE 
MICHIGAN STATE 
U OF WASH SEATTL 
WISCONSIN 
ALABAMA-BIRMNGHM 
COLORADO 
FLORIDA 
UTAH 
NORTH CAROLINA 
KENTUCKY 
RUTGERS 
NEW JERSEY 
ARIZONA 
CALIF DAVIS 
TEXAS SAN ANTON 
NEW MEXICO 
CINCINNATI 
MIAMI 
NEW YORK UNIV 
CASE WESTERN RES 



APPENDIX B-1 

WARD HIERARCHICAL CLUSTER ANALYSIS OF 110 U.S. MEDICAL 

ITEM NAME 

U OF CHICAGO 
ROCHESTER 
VANDERBILT 
ALABAMA-BIRMNGHM 
U OF WASH SEATTL 
YALE 
IOWA 
WISCONSIN 
ALBANY 
BOWMAN GRAY 
UTAH 
VERMONT 
TUFTS 
MISSOURI-COLUMB 
WEST VIRGINIA 
FLORIDA 
EMORY 
ARKANSAS 
KANSAS 
CINCINNATI 
SOUTHERN CALIF 
BOSTON 
OREGON 
MC OF WISCONSIN 
DUKE 
STANFORD 
JOHNS HOPKINS 
PITTSBURGH 
U OF VIRGINIA 
CASE WESTERN RES 
NORTH CAROLINA 
WASH U ST LOUIS 
COLORADO 
MIAMI 
COLUMBIA 
NEW YORK UNIV 
EINSTEIN 
HARVARD 
CORNELL 
U OF MICHIGAN 
MINN MINNEAPOLIS 
U OF PENN. 
TEXAS SOUTHWEST 
CALIF SAN FRAN 
CALIF L A 
MEHARRY 
SOUTH DAKOTA 
CREIGHTON 
MISSISSIPPI 
HAHNEMANN 
TULANE 
HOWARD 
GEORGIA 
LOUISVILLE 
TENNESSEE 
LOUISIANA NW ORL 
MARYLAND 
OHIO 
M.C. OF VIRGINIA 
SOUTH CAROLINA 
NEBRASKA 
ST LOUIS 
LOMA LINDA 
OKLAHOMA 
SUNY BUFFALO 
GEORGE WASH 
LOYOLA 
SUNY UPSTATE 
TEXAS GALVESTON 
NORTHWESTERN 
TEMPLE 
GEORGETOWN 
SUNY DOWNSTATE 
NEW YORK MED 
INDIANA 
WAYNE STATE 
ILLINOIS 
JEFFERSON 
MISSOURI K.C. 
MINN DULUTH 
CHICAGO MEDICAL 
NEVADA 
SO. ILLINOIS 
MT SINAI 
RUSH MED COL 
MAYO 
CALIF IRVINE 
TEXAS HOUSTON 
NEW MEXICO 
ARIZONA 
CONNECTICUT 
CALIF SAN DIEGO 
PENN STATE 
TEXAS SAN ANTON 
CALIF DAVIS 
LOUISIANA SHRVPT 
MICHIGAN STATE 
OHIO TOLEDO 
SOUTH FLORIDA 
MASSACHUSETTS 
NEW JERSEY 
RUTGERS 
KENTUCKY 
TEXAS TECH 
SOUTH ALABAMA 
PUERTO RICO 
BROWN 
DARTMOUTH 
M.C. OF PENN. 
SUNY STONY BRK 



APPENDIX C-1 



CLUSTER MEMBERSHIP AND PROFILES OF CLUSTER 
CENTROIDS FROM CLUSTER ANALYSIS OF FIVE FACTOR SCORES, 1976 

MEMBERSHIP 

BROWN, TUFTS, MASSACHUSETTS, SOUTH FLORIDA, M.C. OHIO TOLEDO, 
CONNECTICUT, M.C. OF PENN., DARTMOUTH, PUERTO RICO, 
MICHIGAN STATE, SUNY-STONY BRK 

COLORADO, FLORIDA, ROCHESTER, EMORY, VANDERBILT, UTAH, 
U. OF CHICAGO, YALE, CINCINNATI, DUKE, WISCONSIN, VERMONT, 
NEW MEXICO, KANSAS, STANFORD, SOUTHERN CALIF., BOSTON, ARIZONA, 
NORTH CAROLINA, MISSOURI-COLUMBIA, WASH. U. ST. LOUIS, 
ALABAMA-BIRMNGHM, IOWA, MC OF WISCONSIN, U OF WASH. SEATTL 

U. OF PENN., NEW YORK UNIV., TEXAS-SOUTHWEST, MIAMI, EINSTEIN, 
U. OF MICHIGAN, MINN-MINNEAPOLIS, COLUMBIA, CALIF.-SAN FRAN., 
CORNELL, JOHNS HOPKINS, CALIF. L.A. 

MARYLAND, HAHNEMANN, SOUTH DAKOTA, ST. LOUIS, CREIGHTON, TULANE, 
LOUISIANA, ARKANSAS, MISSISSIPPI, HOWARD, LOUISVILLE, GEORGIA, 
BOWMAN GRAY, SOUTH CAROLINA, MEHARRY, OKLAHOMA, SOUTH ALABAMA, 
LOMA LINDA, WEST VIRGINIA, NEBRASKA 

[Profile plot: each cluster centroid is plotted on a LO / MD / HI scale 
for five factors: Graduate Medical Education Programs, Size and Age, 
Research Funding Success, Development Stage, Research Emphasis.] 

APPENDIX C-2 

MEMBERSHIP OF EIGHT CLUSTERS OF U.S. MEDICAL SCHOOLS IN ORDER 
OF DISTANCE FROM CLUSTER CENTROID BASED ON 
CLUSTER ANALYSIS OF FIVE FACTOR SCORES 



School                 Distance 

CLUSTER 1 

BROWN                  ?.2914 
TUFTS                  ?.3060 
MASSACHUSETTS          ?.6796 
SOUTH FLORIDA          ?.8369 
M.C. OHIO TOLEDO       ?.3469 
CONNECTICUT            ?.7556 
M.C. OF PENN.          ?.0012 
DARTMOUTH              ?.1798 
PUERTO RICO            ?.1041 
MICHIGAN STATE         ?.0551 
SUNY STONY BRK        14.2849 

CLUSTER 2 

COLORADO                .3258 
FLORIDA                 .4116 
ROCHESTER               .4451 
EMORY                   .5268 
NORTH CAROLINA          .5641 
UTAH                    .6734 
U OF CHICAGO            .6734 
WASH U ST LOUIS         .8177 
DUKE                    .8249 
WISCONSIN               .8673 
KANSAS                  .9193 
IOWA                    .9201 
VANDERBILT              .9998 
STANFORD               1.0871 
BOSTON                 1.1410 
ALABAMA-BIRMNGHM       1.2461 
MC OF WISCONSIN        1.5048 
SOUTHERN CALIF         1.5793 
U OF WASH SEATTL       1.6304 
MISSOURI-COLUMB        1.8118 
NEW MEXICO             1.8349 
YALE                   1.8425 
CINCINNATI             1.9762 
VERMONT                2.3603 
ARIZONA                2.5346 

CLUSTER 3 

U OF PENN.              .1021 
NEW YORK UNIV           .4211 
TEXAS SOUTHWEST         .5455 
MIAMI                   .6465 
EINSTEIN                .8548 
U OF MICHIGAN           .9073 
MINN-MINNEAPOLIS       1.2214 
COLUMBIA               1.2418 
CALIF SAN FRAN         1.3610 
JOHNS HOPKINS          1.5516 
CORNELL                1.6007 
CALIF L A              2.0233 

CLUSTER 4 

HAHNEMANN               .3818 
ST LOUIS                .4429 
TULANE                  .4914 
CREIGHTON               .5009 
ARKANSAS                .5022 
LOUISIANA NW ORL        .5370 
HOWARD                  .5607 
LOUISVILLE              .6337 
MARYLAND                .6714 
MISSISSIPPI             .7541 
GEORGIA                 .8673 
NEBRASKA                .8938 
BOWMAN GRAY            1.0283 
MEHARRY                1.1973 
SOUTH CAROLINA         1.2167 
SOUTH ALABAMA          1.5677 
OKLAHOMA               1.7851 
WEST VIRGINIA          1.8382 
SOUTH DAKOTA           1.8838 
LOMA LINDA             2.9749 

CLUSTER 5 

NORTHWESTERN            .2354 
OHIO                    .3756 
GEORGETOWN              .4439 
TEMPLE                  .4599 
M.C. OF VIRGINIA        .4974 
TENNESSEE               .7677 
JEFFERSON               .9759 
SUNY BUFFALO           1.0207 
OREGON                 1.0271 
WAYNE STATE            1.2812 
INDIANA                1.6136 
SUNY DOWNSTATE         2.1479 
ILLINOIS               3.6354 
GEORGE WASH            4.9173 
LOYOLA                 5.6293 

CLUSTER 6 

PITTSBURGH              .1649 
SUNY UPSTATE            .4070 
CASE WESTERN RES        .5244 
U OF VIRGINIA           .7098 
RUTGERS                1.0170 
KENTUCKY               1.0641 
TEXAS GALVESTON        1.1260 
ALBANY                 1.1471 
NEW JERSEY             1.2880 
HARVARD                1.6775 
NEW YORK MED           1.8277 

CLUSTER 7 

SO. ILLINOIS           1.4077 
LOUISIANA SHRVPT       2.1350 
NEVADA                 ?.6368 
TEXAS TECH             ?.7573 
MISSOURI K.C.          ?.9510 
MINN-DULUTH            ?.3004 
CHICAGO MEDICAL        ?.7453 
RUSH MED COL           ?.6961 
MT SINAI               ?.8062 

CLUSTER 8 

TEXAS SAN ANTON         .0116 
CALIF DAVIS             .2052 
PENN STATE              .6666 
CALIF SAN DIEGO         .8082 
CALIF IRVINE            .8530 
TEXAS HOUSTON         11.7976 

(? = leading digit illegible in the source copy.) 
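
The ordering used in Appendix C-2 (schools listed within a cluster by increasing distance from the cluster centroid) can be illustrated with a minimal Python sketch. It rests on assumptions not confirmed by the report: the centroid is taken as the mean of the members' factor scores, and the distance is plain Euclidean distance (the report does not name the metric here). The function name, the school labels, and the factor scores are all hypothetical and invented for illustration; they are not AAMC data.

```python
import math


def rank_by_centroid_distance(scores):
    """Return (centroid, ranked), where ranked is a list of
    (name, distance) pairs sorted by ascending distance.

    `scores` maps a school name to its list of factor scores.
    Euclidean distance is an assumption; the report does not state the metric.
    """
    names = list(scores)
    n_factors = len(scores[names[0]])
    # Centroid = mean of each factor score over the cluster's members.
    centroid = [
        sum(scores[name][k] for name in names) / len(names)
        for k in range(n_factors)
    ]
    ranked = sorted(
        ((name, math.dist(scores[name], centroid)) for name in names),
        key=lambda pair: pair[1],
    )
    return centroid, ranked


# Hypothetical five-factor scores for a three-school cluster (illustration only).
example = {
    "SCHOOL A": [0.0, 0.0, 0.0, 0.0, 0.0],
    "SCHOOL B": [1.0, 0.0, 0.0, 0.0, 0.0],
    "SCHOOL C": [5.0, 0.0, 0.0, 0.0, 0.0],
}

centroid, ranked = rank_by_centroid_distance(example)
for name, distance in ranked:
    print(f"{name:<10} {distance:.4f}")
```

A school far from its centroid, such as the last entries of Clusters 1 and 8, would appear at the bottom of such a ranking and is the least typical member of its cluster.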
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